Text File | 1992-04-05 | 16.3 KB | 413 lines | [TEXT/KEEN] |
- Debugging hAWK programs
- -------------------
-
- Introduction
- It doesn't run
- It doesn't do what I wanted
- Common bugs
-
- ---------
- Introduction
- ---------
- Errors can creep in at the specification, design, or coding stage of any
- program, in any language. Symptoms of an error can range from a vague
- uneasiness about the results to seemingly random crashes. In C, one of the
- most difficult tasks of debugging is to stabilize a bug so that it can be
- repeated consistently; fortunately, in hAWK this isn't a problem, since it
- doesn't allow writing to an arbitrary memory address. So for hAWK
- programs, your tasks are to find where the bug is, and fix it.
-
- This is mainly a guide to finding where the bug is in your source code, with a
- brief list of common bugs. When it's not obvious where the bug is, your two
- best weapons are to insert "print" statements to give you some idea of what's
- going on, and to selectively comment out lines of source to isolate the problem.
- But as you gain experience writing hAWK programs you'll naturally find
- yourself avoiding the common bugs, and catching the others with careful
- proofreading.
-
- Develop and test in small pieces. Follow a plan. Don't get mad, get critical.
- Question everything, including this cheap advice.
-
- ---------
- It doesn't run
- ---------
- If your program doesn't start running due to a syntax error, the
- "$tempStdErr" file will contain a message telling you the line number
- where the error was detected. Normally this line number will be exactly
- where the error is, or at most a couple of lines after the real mistake.
-
- Most syntax errors will be easy to spot and fix—missing punctuation such
- as brackets or quotes for example. One oddball error that may be difficult to
- diagnose is the insidious missing "#", as in
- #$Calculate: a four function calculator.
- #Enter expressions using numbers and + - * /.
- #If the expression is not properly formed, as in
- 2 + 3 * / 7.5
- #then you'll get an error message.
- #....
- —this would produce the error message
- hAWK: syntax error near line 4:
- 2 + 3 * / 7.5
- ^ parse error
- in $tempStdErr. However, be thankful if you GET a message complaining
- about an uncommented comment. Quite often, you'll get no message at
- all—hAWK will quietly execute the comment as though it were part of your
- program (more on this below - see the start of "Common bugs").
-
- Using a hAWK key word or builtin function name as a variable name is also a
- popular error—watch out especially for "in" and "length".
-
- A carriage return or semicolon is sometimes required in a hAWK statement
- to disambiguate the syntax. See the "Grouping and breaking lines" section in
- the "hAWK program structure" chapter of the hAWK User's Manual for the
- details (that's section I 2 in the popup marks menu for the manual).
-
- If you really get stuck trying to diagnose a syntax error, try looking through
- the sample programs supplied with hAWK for similar constructions, as well
- as rereading the relevant manual sections. See also the "Common bugs" section
- below.
-
- ------------------
- It doesn't do what I wanted
- ------------------
- Proofreading helps
- hAWK programs are so easy to write that there is a strong temptation to
- rush. The fix for most of these problems is to carefully read through your
- new code once before running it. However, we're only human....
-
- Print power
- The best way to diagnose a bug in a program is to get your program to talk to
- you about what it is doing. Your most powerful debugging aid is built in to
- hAWK, and goes by the name of "print". The first rule of hAWK debugging is,
- Print Out What's Really Going On. Many of the suggestions below deal with
- printing out diagnostics.
-
- Make a copy
- If you're debugging an ambitious program, make a copy of it and debug the
- copy. By spinning off one or more versions of your program, you'll be able to
- back up if you angrily delete a stupid chunk of code, only to realise later that
- it was inspired and perfectly correct.
-
- Track your changes
- In a typical debugging session you will be inserting new statements on a trial
- or temporary basis, and also deleting old statements. It can be difficult to
- back up, and easy to get lost in a knot of conditional trials, so the second rule
- of hAWK debugging is Mark Your Changes. To temporarily comment out a
- statement, place "##" in front of it, rather than a single "#". If a new
- statement may not be permanent, place "###" after it. Later, you can
- search for your changes by looking for "##" and "###". Any variation on
- this such as "#@" is perfectly fine; the goal is to be able to spot all of your
- changes at any time by using your "Search" command in your editor.
-
- You can even separate out connected changes by tagging the affected lines with
- "##1" for one group, "##2" for another (the tag goes at the beginning of a
- line for a delete, at the end of the line for a newly-added statement). But
- this is getting a bit complicated, so do it only as a last resort.
-
- Variable values
- To check the value of a variable (say x1), you might as well keep it simple:
- print x1 ###
- will do the job. If you are checking many variables, something like
- print "x1 =", x1, "at line 43" ###
- may be called for.
-
- Variable names
- There is no such thing as an undeclared variable in hAWK. This convenience
- can trip you up, however, if you accidentally misspell the name of a
- variable. There will be no syntax error; the misspelled name will just be
- treated as a different variable. If you suspect a spelling error is the culprit
- but can't spot it, run $WordFrequency using your problem program as the
- input. This will produce a list of all words in your program, making it easier
- to pick out wrong spellings.
-
- Note: to have $WordFrequency skip over comments, you can uncomment
- ##/^#/ {next} #skip lines containing hAWK comments
- just after the "BEGIN" block in it.
-
- Assertions
- Assertions are easily checked by adding an "assert" function:
- function assert(expr, message, line)
- {
- if (!expr)
- print "Assertion flunked:", message, "at", line
- }
- with usage such as
- assert(x1 <= 50, "x1 <= 50", 43) ###
- --note that this still prints properly if you leave out the line
- number as in assert(x1 <= 50, "x1 <= 50") - you just won't
- get the line number where the problem occurred.
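-
- A minimal sketch of the assert function in use, assuming a standard awk
- rather than hAWK specifically. The "flunked" counter and the sample values
- are illustrative additions, not part of the function above.
-
```awk
# Sketch only: the "flunked" counter is an illustrative addition.
function assert(expr, message, line)
{
    if (!expr) {
        flunked++
        print "Assertion flunked:", message, "at", line
    }
}

BEGIN {
    x1 = 72
    max = 500
    assert(x1 <= 50, "x1 <= 50", 43)                            # flunks
    assert(max > 0 && max < 1000, "max > 0 && max < 1000", 44)  # passes
    assert(x1 > 0, "x1 > 0")          # passes; line number left out
    print flunked + 0, "assertion(s) flunked"
}
```
- Counting the failures makes it easy to print a one-line summary at the end
- of a run instead of scanning the output for messages.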
-
- Assertions the easy way
- For the truly lazy, you could put your assertions in using an
- abbreviated form, such as
- a(x1 <= 50)...
- a(max > 0 && max < 1000) etc
- and then run this little hAWK program, which is in your "hAWK
- programs" folder, on your problem program to fill out the
- assertions (note it's been debugged):
-
- # $ExpandAssertions : see "Debugging hAWK programs"
- # (the tricky bits - treat quotes properly, and
- # avoid using sub(), since the replacement string might contain
- # a "&" which stands for "everything that was matched".)
- #
- # Pass it your problem program as the single input file:
- # overwrites the file, replacing a(assertion) with
- # assert(assertion, "assertion", line number) ###
- FNR == 1 {outfile = FILENAME}
- { if (match($0, /^[ \t]*a\((.+)\)/)) #only lines that begin with a(...)
- {
- match($0, /\((.+)\)/) #find the argument proper
- first = substr($0, RSTART+1, RLENGTH-2) #copy it
- second = first
- gsub(/"/, "\\\"", second) #escape quotes
- match($0, /[ \t]*/) #match starting white space
- starter = substr($0, RSTART, RLENGTH) #copy it
- $0 = starter "assert(" first ", \"" second "\", " FNR ") ###"
- ##sub(/a\((.+)\)/, "assert(" first ", \"" second "\", " FNR ") ###")
- ##-deleted, doesn't work properly if "second" contains a "&"
- }
- out[++i] = $0
- }
- END { close(outfile)
- for (j = 1; j <= i; ++j)
- print out[j] > outfile
- }
-
-
- This will expand your abbreviations into proper assertions:
- assert(x1 <= 50, "x1 <= 50", ddd) ###...
- assert(max > 0 && max < 1000, "max > 0 && max < 1000", ddd) ### etc
- where ddd is the line number in your program.
-
- Add the "assert" function above to your program too!
-
- Function flow
- Tracing function flow can be done by inserting print statements
- at the start and end of each function, eg:
- function a_func(args...)
- {
- print "Entering: ""a_func"
- ...body of function
- print "Leaving: ""a_func"
- return something
- }
- though the "Leaving" statements require more care, as your function
- might have several "return" statements. Often, just the "Entering"
- print statements provide enough information for debugging.
-
- Function flow the easy way
- "Entering" print statements can be inserted with the following
- hAWK program (once again your original program will be
- overwritten, so use a copy):
-
- #$EnteringFunction: add debugging to a program with functions,
- # inserting print statements at beginning of each function.
- # Pass it your problem program as the single input file:
- # overwrites the file, so use a copy.
- FNR == 1 {outfile = FILENAME}
- {
- if (match($0, /^[ \t]*func/)) #start of function definition
- {
- name = $2
- len = index(name, "(")
- if (len+0 > 1)
- name = substr(name,1,len-1)
- out[++i] = $0
- # Skip over opening left curly of function
- if ($0 !~ /{/)
- {
- do
- {
- getline
- out[++i] = $0
- } while ($0 !~ /{/);
- }
- out[++i] = "print \"Entering: \"\"" name "\" ###"
- }
- else
- out[++i] = $0
- }
- END { close(outfile)
- for (j = 1; j <= i; ++j)
- print out[j] > outfile
- }
-
- You'll find this program in your "hAWK programs" folder.
-
- Sending diagnostics to stderr
- If your program writes to stdout and you'd rather redirect your
- output to a different file to make things easier to read, then instead
- of just a plain "print" for your debugging
- print "Debugging or error message"
- you can use
- print("Debugging or error message") > "stderr"
- This will send diagnostics to the file $tempStdErr. The parenthesized
- form of the print statement should be used to make it clear to the
- interpreter that ">" means "redirect", not "greater than".
-
- For example, to print assertions to $tempStdErr you could use
- function assert(expr, message, line)
- {
- if (!expr)
- print ("Assertion flunked:", message, "at", line) > "stderr"
- }
- And to send function flow messages to stderr, replace the appropriate
- line in $EnteringFunction with
- out[++i] = "print( \"Entering: \"\"" name "\") > \"stderr\" ###"
-
- The $tempStdErr file will not be opened for you automatically after a run,
- so remember to open it and take a look if you send messages there.
-
- hAWK isn't C
- hAWK declares variables for you, happily concatenates just about anything
- with anything, accepts functions with a variable number of arguments,
- doesn't mind if you use a name as a number, a string, AND an array
- all in the same program, and just loves to print the current record to
- stdout unless you say otherwise. To some extent this is "too much of a
- good thing", and it certainly takes getting used to. The "Common bugs"
- section below is mostly a list of things that hAWK does differently from
- C, and a quick browse through will help you avoid programming in the
- wrong language.
-
- ----------
- Common bugs
- ----------
- Uncommented comment: if you just can't find the bug, reread your program
- and look for a line that should be a comment but isn't. Forgetting to put
- the "#" at the start of a comment doesn't always cause a syntax error;
- often hAWK will execute the text as though it were code without error,
- and if the text involves any variables you use in your program then
- odd things can happen. Typical symptoms are that lines are printed to
- stdout that you didn't expect, or you can't seem to set the value of a
- variable.
- Watch for things like
- #Setting
- x = -1
- #will shut off all progress dialogs,
- #for quiet running.
- --here, "x = -1" would be interpreted as a pattern; it would always
- evaluate to nonzero, so all lines of input would be printed to stdout
- (the default action if no action is given).
- or
- x = -1; x = 0 will enable dlogs
- --here, x would have the value "0", as a result of concatenating "0"
- with the (presumably) unassigned variables "will", "enable", and "dlogs",
- overriding the previous "x = -1" assignment. Strange, but true.
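-
- The second case can be reproduced in a few lines. This is a sketch for a
- standard awk (an assumption; hAWK is described above as behaving the same
- way), with the stray comment text placed inside a BEGIN block so that no
- input is needed.
-
```awk
BEGIN {
    x = -1
    x = 0 will enable dlogs    # a comment that lost its "#"
    # x is now the string "0": the number 0 concatenated with three
    # unassigned (empty) variables.
    print "x is \"" x "\", length", length(x)
}
```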
-
- Spelling error: the drawback of not having to declare variables in hAWK is
- that a spelling error can accidentally create a new variable. For example,
- if (maxLines > 50)
- maxlines = 50
- would never set "maxLines" - it would create a new variable "maxlines"
- and set it instead. If you suspect a spelling error is the culprit but can't
- spot it, run $WordFrequency using your problem program as the input.
- This will produce a list of all words in your program, making it easier
- to pick out wrong spellings.
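-
- A short sketch of the same mistake, with a print added to show both
- variables side by side. The starting value 80 is an arbitrary illustration.
-
```awk
BEGIN {
    maxLines = 80
    if (maxLines > 50)
        maxlines = 50      # misspelled: this sets a different variable
    print "maxLines =", maxLines, "maxlines =", maxlines
}
```
- Printing both spellings is a quick way to confirm the suspicion before
- running $WordFrequency.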
-
- Unintentional redirection: "getline" returns 0 at end of file, -1 if there
- is a problem reading the file, and 1 if all is OK. So what does
- if(getline < 0)....
- do? It attempts to read from the file named "0", and usually doesn't succeed.
- If you want to test whether getline returned -1, use
- if ((getline) < 0)...
- Even better, use
- if(getline <= 0)....
- or
- while (getline > 0)....
- instead.
-
- Endless getline loop: "getline" returns 0 only if it has successfully reached
- the end of a file. If there is a problem opening or reading a file, "getline"
- returns -1. So,
- while (getline < theFile)....
- will loop forever if it has trouble reading a file. Use
- while ((getline < theFile) > 0)....
- instead.
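-
- One way to package the safe loop is a small helper, sketched here for a
- standard awk. The function name "count_lines" and the file path are
- hypothetical; the path is simply assumed not to exist, so that the error
- branch is exercised without hanging.
-
```awk
# Count the lines in a file; returns -1 if the file cannot be read.
# "line", "n" and "status" are locals (declared as extra parameters).
function count_lines(file,    line, n, status)
{
    n = 0
    while ((status = (getline line < file)) > 0)
        n++
    close(file)
    return (status < 0) ? -1 : n
}

BEGIN {
    # Hypothetical path, assumed not to exist on this machine.
    missing = "/no/such/dir/hawk-debug-demo.txt"
    print "count_lines on a missing file returns", count_lines(missing)
}
```
- Keeping the getline status in a variable lets the caller tell an empty
- file (0 lines) apart from a file that could not be read at all.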
-
- Unassigned variable: sometimes you may wish to safeguard against forgetting
- to assign a value to a variable, or to give it a default value if no value was
- ever assigned. The tests for this are:
- if:                 then this test is true:
- ---------------     -----------------------
- x is unassigned     if (x == "" && x == 0)
- x = anything        if (x != 0 || x != "")
-
- For example,
- if (find == "" && find == 0)
- print "Oops, forgot to set \"find\"."
- More commonly, you'll just be interested in whether a variable is
- null (empty or never assigned), for which the test
- if (x == "")
- will do.
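-
- The "unassigned" test can be checked against the two values it is most
- easily confused with. This sketch assumes standard awk comparison rules;
- the variable names are illustrative.
-
```awk
BEGIN {
    zero = 0
    empty = ""
    # never_set is deliberately left unassigned.
    r_unset = (never_set == "" && never_set == 0)
    r_zero  = (zero == "" && zero == 0)     # fails the == "" half
    r_empty = (empty == "" && empty == 0)   # fails the == 0 half
    print "never_set:", r_unset, " zero:", r_zero, " empty:", r_empty
}
```
- Only the unassigned variable passes both halves of the test, because it
- alone counts as both the null string and zero.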
-
- Comparing string with number: in a comparison such as
- if (x == y) or if (x >= y) etc
- x and y are compared as strings unless both x and y are numbers.
- To force the comparison to be done with the numeric values of the
- variables, add 0 to each, as in
- if (x+0 == y+0) or if (x+0 >= y+0).
- Conversely, to force the comparison to be done with the string values,
- concatenate at least one with the null string, as in
- if (x "" == y) or if (x >= y "").
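-
- A small sketch of why forcing the comparison matters, assuming standard
- awk rules. Both values are assigned from string constants, so the plain
- comparison is done on text.
-
```awk
BEGIN {
    x = "10"
    y = "9"
    as_strings = (x < y)        # true: "10" sorts before "9" as text
    as_numbers = (x+0 < y+0)    # false: 10 is not less than 9
    print "as strings:", as_strings, " as numbers:", as_numbers
}
```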
-
- Global instead of local: if you forget to declare a local variable, or misspell
- the name of a local variable, your variable will be global rather than
- local. It will not be initialized to zero each time you call the function, and
- if it is used elsewhere as a global then changing its value in one place will
- affect the other place. $WordFrequency helps with misspellings (see above).
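-
- The effect is easy to see with two versions of the same function. This
- sketch assumes the usual awk convention that extra parameters in the
- function definition serve as local variables; the function names are
- hypothetical.
-
```awk
# "i" and "total" are global here because they were never declared local.
function sum_leaky(n)
{
    for (i = 1; i <= n; i++)
        total += i
    return total
}

# Extra parameters act as locals and start out empty on every call.
function sum_local(n,    i, total)
{
    for (i = 1; i <= n; i++)
        total += i
    return total
}

BEGIN {
    leaky1 = sum_leaky(3); leaky2 = sum_leaky(3)
    local1 = sum_local(3); local2 = sum_local(3)
    print "leaky:", leaky1, leaky2    # second call carries over the old total
    print "local:", local1, local2    # both calls give the same answer
}
```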
-
- Function returns garbage: if you forget to return something from a function
- then the returned value of the function will be garbage. If your function
- should return something, check that it does so in all possible cases.
-
- Patterns are not mutually exclusive: more of a design problem than a bug,
- the problem here is to execute one action if some complicated test is true,
- and do some other action otherwise. To do it, put all the complicated testing
- inside one action, so you can use an "if-else" construction.
- All patterns in your program are executed for each new input line, unless
- you do one of:
- "next", which retrieves the next input line and starts the pattern
- matching over with your first pattern;
- "getline", which retrieves the next input line to $0 (if no variable
- is specified) but doesn't jump you back to the first pattern;
- "exit" which skips to your END statements or exits immediately.
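-
- As a sketch of the "one action, if-else" approach, the testing can be
- moved into a function that returns exactly one answer per line. The
- function name "classify" and its three categories are illustrative only.
-
```awk
# Decide once per line and return exactly one label.
function classify(line)
{
    if (line ~ /^#/)
        return "comment"
    else if (line ~ /^[ \t]*$/)
        return "blank"
    else
        return "code"
}

BEGIN {
    print classify("# a note")
    print classify("")
    print classify("x = 1")
}
```
- In a real program the function would be called from a single action with
- no pattern, for example { count[classify($0)]++ }, so that each input line
- is handled exactly once.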
-
- Exit doesn't exit: an "exit" inside an END block does truly and immediately
- exit. An exit anywhere else passes control to your END statements if there
- are any, so if you need to do an immediate exit in a program that contains
- an END block use something like
- ...
- if (should immediately exit)
- {
- exit_now = 1
- exit
- }
- ...
- END {
- if (exit_now == 1)
- exit
- ...normal END actions
- }
-
- Regular expression matches too much: regular expressions try to match
- as much as possible - for example, /A.*B/ will match everything between
- the first "A" on the line and the last "B" on the line, even if there are 5 B's
- in between. To match from A to the first B, use /A[^B]*B/.
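-
- The difference can be seen by pulling out the matched text with match()
- and substr(). A sketch with an arbitrary test string:
-
```awk
BEGIN {
    s = "xAyBzB"
    match(s, /A.*B/)                        # runs on to the last B
    greedy = substr(s, RSTART, RLENGTH)
    match(s, /A[^B]*B/)                     # stops at the first B
    shortest = substr(s, RSTART, RLENGTH)
    print "greedy:", greedy, " to first B:", shortest
}
```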
-
- Matching C identifiers: "\w" in a regular expression is the same as
- "[A-Za-z0-9]" - note it doesn't include the underscore. To match a C
- name, use "[A-Za-z_][A-Za-z0-9_]*" or the equivalent
- "[A-Za-z_](\w|_)*". Using "[A-Za-z0-9_]+" will work provided
- you don't mind catching integers such as "0" or "317" as well as names.
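-
- A sketch of the first pattern wrapped in a test function. The name
- "is_c_name" is hypothetical, and the pattern is anchored with ^ and $ so
- that the whole string must be a name. It uses explicit character classes
- only, so it does not depend on how "\w" is defined.
-
```awk
# Returns 1 if s is a valid C identifier, 0 otherwise.
function is_c_name(s)
{
    return (s ~ /^[A-Za-z_][A-Za-z0-9_]*$/)
}

BEGIN {
    print is_c_name("max_lines"), is_c_name("_tmp1")   # both are names
    print is_c_name("317"), is_c_name("9lives")        # neither is a name
}
```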
-
- Missing array index: consider the fragment
- x[1]= 17
- x = "huh?"
- x[2] = "hi"
- print x[1], x[2], x
- -will this run? You bet - a variable and an array can have the same name,
- with no interference between the two. The above fragment will print
- 17 hi huh?
- Needless to say, this "feature" should be avoided.
-